A foundation is investigated for the application of loosely structured data on the Web. This area is
often referred to as Linked Data, due to the use of URIs in data to establish links. This work focuses 
on emerging W3C standards which specify query languages for Linked Data. The approach is to 
provide an abstract syntax to capture Linked Data structures and queries, which are then internalised 
in a process calculus. An operational semantics for the calculus specifies how queries, data and 
processes interact. A labelled transition system is shown to be sound with respect to the operational 
semantics. Bisimulation over the labelled transition system is used to verify an algebra over queries. 
The derived algebra is a contribution to the application domain. For instance, the algebra may be 
used to rewrite a query to optimise its distribution across a cluster of servers. The framework used to 
provide the operational semantics is powerful enough to model related calculi for the Web. 

1 Introduction 

The application of interest is a powerful emerging idea commonly referred to as the Web of Data.
The Web of Data marks a shift from publishing documents to publishing data. The Web is based on 
documents which contain links to other documents. The Web of Data is concerned with resources more 
general than documents. Data on the Web contains links to resources described in multiple data sources. 
In both the case of the Web and the Web of Data the links between documents and resources, respectively, 
are established by a standardised global naming system, the URI. On the Web, URIs allow documents
in distributed locations with distinct ownership to refer to each other. Similarly, in a Web of Data, URIs 
allow data in distributed locations with distinct ownership to refer to common resources. 

Suppose that the URIs are not used as a standard naming system. In this case, each data source uses 
its own naming system. Typically, in this case each data source is disjoint, hence traditional database 
techniques may be applied. This is referred to as a closed world system, since the boundaries of the data
source are known. For instance, classical negation can be used to determine whether some data does not 
appear in a data source, and schemata can constrain the structure of data. 

In contrast, the presence of URIs as a global naming system enables an open world system. In an
open world system a variety of protocols can be used to obtain data from multiple sources based on the 
URIs which appear. For instance, a request may be sent to a URI to directly obtain some data about that 
URI. Alternatively, services may be used to find data relevant to a URI. In this open world setting, there is
no guarantee that mechanisms find all relevant data. There may always be data not known locally which 
refers to a resource; hence in general optimal query results cannot be obtained and classical negation 
cannot be applied. Another restriction in an open world system is that schemata which constrain data 
cannot be enforced globally. 

A light semi-structured data format must be agreed for the Web of Data. The W3C recommends the 
Resource Description Framework (RDF) as a general format for presenting data [16]. RDF is based on 
triples which consist of a subject, predicate and object. The subject, predicate and object are all named by 
URIs. Each URI in a triple may represent resources in different locations, hence a triple links locations.
Other semi-structured data formats contain URIs, such as feeds. RDF is intended as a minimal data
format to which other formats can be lifted. 

Assuming that Linked Data can be gathered, observations about Linked Data can be made. The W3C 
recommendation is to use SPARQL Queries to make such observations [24]. In this work, to model this 
scenario, both RDF Data and SPARQL Queries are internalised in a process calculus. The operational 
semantics of the process calculus specifies how queries and data interact, to realise the W3C
recommendations. The operational semantics are realistic since there is no guarantee of maximal responses, only
that responses are correct. 

Two SPARQL Queries may be indistinguishable with respect to their operational behaviour. Such 
operationally equivalent queries are bisimilar. In this work, bisimulation is used to derive an algebra over 
SPARQL Queries. The algebra agrees with expected equivalences analogous to those uncovered by
relational algebra and exposes some new equivalences. The derived algebra can be used to rewrite a query
to a normal form. Normal forms are useful for optimisation purposes. A query can be optimised before 
being distributed over multiple data sources. Distribution of queries is a key challenge for enabling a
Web of Data.

Section 2 presents a syntax and semantics for RDF triples, SPARQL queries and processes which
internalise both triples and queries. Section 3 provides an alternative operational semantics using a
labelled transition system. The labelled transition system is proven to be sound with respect to the
reduction system. Section 4 introduces two notions of equivalence over the calculus, which correspond
to the two operational semantics. Bisimulation for the labelled transition system is proven to be sound
with respect to contextual equivalence for the reduction system. An algebra for queries is verified using
bisimulation.

2 A syntax and semantics for the syndication calculus 

The concrete syntaxes of both RDF and SPARQL Query are specified in W3C recommendations [16, 24].
Here an abstract syntax is presented to model the core features of the concrete syntax. This abstract 
syntax is easier to define than the concrete syntax, which is sugared to make programming easier. 

The operational semantics of the calculus is specified as a reduction system. The syntax and rules 
of the reduction system borrow from a fragment of Linear Logic, extended with a continuation. Related 
work has investigated other approaches to using Linear Logic for both query languages and process
calculi.

Note that the description of the syntax and reduction system is brief. A similar syntax and reduction 
system are extensively discussed in the thesis of the first author [14]. The main contribution of this paper 
is the bisimulation results for queries. 

2.1 A syntax for RDF triples 

An abstract syntax for triples conveys the RDF data format. The atoms of the syntax are names and 
literals. Names represent occurrences of URIs, which are represented by identifiers in italics, such as
John or knows. Literals are basic data values, such as the strings 'Paul' or '77-3426'. The definition of
literals in the XML Schema Datatypes specification [4] is assumed. Variables a,b... and x,y ... represent 
place holders for names and literals respectively. 

Figure 1: The syntax of constraints ($\phi$), queries ($U$) and processes ($P$), over triples ($C$).

A triple consists of three components: the subject, the predicate and the object, which is written
(subject predicate object). The subject is related by the predicate to the object, similarly to simple
sentences in English of form subject-verb-object, where URIs and literals are used instead of words. The
syntax ensures that literals can only appear as the object of a triple. The example below presents two
RDF triples.

$$(b_4\ \mathit{home}\ \mathit{starr.uk}) \qquad (b_4\ \mathit{given\_name}\ \text{'Ringo'})$$

Predicates are names such as home. For instance, the first triple above means that a subject $b_4$ is
related by predicate home to object starr.uk. The second triple above indicates that subject $b_4$ is related
by predicate given_name to the literal 'Ringo'.
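The shape of a triple can be sketched as a small data type. The following is a minimal illustration in Python, not part of the calculus itself; the class names `Name`, `Literal` and `Triple` are assumptions introduced here, and the only property enforced is the one stated above, that a literal may appear only in the object position.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Name:
    """A name, standing for an occurrence of a URI (hypothetical helper type)."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A basic data value, such as a string (hypothetical helper type)."""
    value: str


@dataclass(frozen=True)
class Triple:
    subject: Name
    predicate: Name
    obj: Union[Name, Literal]

    def __post_init__(self):
        # Subject and predicate must be names; only the object may be a literal.
        if not isinstance(self.subject, Name) or not isinstance(self.predicate, Name):
            raise TypeError("subject and predicate must be names")
        if not isinstance(self.obj, (Name, Literal)):
            raise TypeError("object must be a name or a literal")


# The two example triples, with b4 used as the subject name.
home = Triple(Name("b4"), Name("home"), Name("starr.uk"))
given = Triple(Name("b4"), Name("given_name"), Literal("Ringo"))
```

Making the types immutable keeps triples usable as set members, which is convenient when a store is treated as a collection of triples.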

2.2 A syntax for SPARQL queries 

In this section an abstract syntax for queries, Fig. 1, represents the core features of SPARQL Query [24].
SPARQL Queries are used to read from RDF triples. Synchronisation constructs allow substantial
queries to be expressed. The syntax of processes, also in Fig. 1, demonstrates how both queries and
content can be internalised in a process calculus. This suggests a high level language for Linked Data
which uses query results. In this model, persistently stored triples are used to answer queries. A stored
triple is indicated by underlining.

Ask queries and multiplicative operators. The simplest 'ask' query provides a triple to be matched.
There are three multiplicative operators: a tensor product ($\otimes$) for synchronously joining queries, a par
operator ($⅋$) for composing processes in parallel and the operator then ($;$) for guarding a process with a
query. The difference between tensor and par is that queries composed using tensor must happen
simultaneously (in the same atomic step), whereas processes composed in parallel may be used in different
atomic steps. Tensor is the implicit join of queries used in SPARQL. Then and par are part of a higher
level language, where query results are immediately used. These operators are multiplicative since they
control the sharing of resources.

The additive operators and select queries. There are three additive operators: choose ($\oplus$), select ($\bigvee$)
and the blank node quantifier ($\bigwedge$). The choose operator presents a choice between two queries, hence
models the SPARQL keyword UNION. The select operator is a quantifier which binds a variable. Select
is used to model SELECT queries in SPARQL, which discover names and literals. The names and literals
discovered can also be bound in a continuation process, hence value passing is modelled at a high level.
Blank node quantifiers provide a model for blank nodes in RDF [16]. A blank node is a local name
where the scope of the blank node is indicated by the scope of the quantifier. Blank nodes allow further
data structures to be represented in RDF, including XML.

$$P ⅋ \bot \equiv P \qquad P ⅋ Q \equiv Q ⅋ P \qquad P ⅋ (Q ⅋ R) \equiv (P ⅋ Q) ⅋ R$$
$$\bigwedge a.\bot \equiv \bot \qquad \bigwedge a.\bigwedge b.P \equiv \bigwedge b.\bigwedge a.P \qquad \bigwedge a.P ⅋ Q \equiv \bigwedge a.(P ⅋ Q) \quad a \notin \mathrm{fn}(Q)$$

Figure 2: The structural congruence over processes.

Constraints and optional queries. A constraint may be used in a query. Constraints form a Boolean
algebra of basic predicates, such as inequalities and regular expressions. The specification of constraints
can be found under the keyword FILTER in the recommendation [24]. A choice between a query and true
models an optional query in SPARQL, so the keyword OPTIONAL is defined as follows:
$\mathrm{OPTIONAL}\ U \triangleq U \oplus I$.

Repeated queries and iteration. A common requirement of a query language is that more than one 
result can be obtained. Bounded multiple copies of queries can be synchronously posed, using queries 
with natural number exponents and finite sums. Exponents and sums are just abbreviations defined as 
follows. 

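$$U^0 \triangleq I \qquad U^{n+1} \triangleq U \otimes U^n \qquad \sum_{n=0}^{0} U^n \triangleq I \qquad \sum_{n=0}^{k+1} U^n \triangleq \sum_{n=0}^{k} U^n \oplus U^{k+1}$$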

A natural number exponent n repeatedly applies the tensor product, so the query must be answered
exactly n times. The sum with bound n allows the query to be answered between zero and n times. Sums
model the keyword LIMIT, such that $U\ \mathrm{LIMIT}\ k \triangleq \sum_{n=0}^{k} U^n$.
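As a small worked instance, the abbreviations can be unfolded for a bound of two. The expansion below assumes the reading $U^0 \triangleq I$, $U^{n+1} \triangleq U \otimes U^n$, with the sum up to $k+1$ defined as the sum up to $k$ chosen against $U^{k+1}$; it is a sketch of how the definitions unwind rather than an additional rule.

```latex
\begin{align*}
U \;\mathrm{LIMIT}\; 2
  &\triangleq \textstyle\sum_{n=0}^{2} U^n \\
  &= \Bigl(\textstyle\sum_{n=0}^{1} U^n\Bigr) \oplus U^{2} \\
  &= \bigl(I \oplus U^{1}\bigr) \oplus U^{2} \\
  &= \bigl(I \oplus (U \otimes I)\bigr) \oplus \bigl(U \otimes (U \otimes I)\bigr)
\end{align*}
```

Read as a query, the three branches answer $U$ zero times, once, or twice within a single atomic step.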

Unbounded iteration of queries is indicated by an explicit operator (*), which allows zero or more 
copies of a query to be answered. Note that iteration differs from replication in common process calculi. 
All copies of an iterated query must be answered simultaneously using disjoint resources. 

2.3 A reduction system for the calculus 

The reduction system presents a concise operational semantics for the calculus. The reduction system is 
defined by a structural congruence and a relation over processes called the commitment relation. A
further preorder over triples formalises key features of RDF Schema (RDFS [7]). RDFS is a light extension
to RDF, which improves interoperability by resolving aliases between URIs.

The structural congruence ($\equiv$ in Fig. 2) is defined such that $(P, ⅋, \bot)$ forms a commutative monoid.
Alpha conversion can also be applied to blank node quantifiers. Furthermore, blank node quantifiers can
be eliminated in the presence of nothing, commute and distribute over par. All reductions are considered
up to structural congruence, as is standard in process calculi.

The commitment relation ($\rhd$ in Fig. 3) specifies atomic operational steps. The process on the left of
the commitment relation becomes the process on the right. A commitment is performed atomically.

Working with aliases for URIs is a key problem in Linked Data. Aliases arise since different
data sources use different URIs for similar purposes. For instance, in the context of a song, predicate
lyricist may be more specific than predicate creator (see subPropertyOf in RDFS [7]). Similarly, $\mathit{song}_0$
and $\mathit{song}_1$ may be URIs for the same song (see sameAs in OWL [2]). Hence the aliases
$\mathit{lyricist} \sqsubseteq \mathit{creator}$ and $\mathit{song}_0 \sqsubseteq \mathit{song}_1$ may be assumed. The application specific set of alias
assumptions is referred to as $\beta$. The transitive reflexive closure of $\beta$ gives rise to a preorder ($\sqsubseteq$) over URIs.

Figure 3: Commitment rules: ask, filter, choose left, choose right, tensor, weakening, dereliction,
contraction, select name, select literal, guard, context, and blank node (fn indicates the free names).

The ask axiom, guard rule and alias assumptions. The following example demonstrates the
interaction of an ask query with a continuation and a stored triple. The axiom 'ask' allows a query triple and
a stored triple to interact. The stored triple remains available after the commitment. The axiom 'guard'
makes the continuation process available after the commitment.

$$\underline{(\mathit{song}_0\ \mathit{lyricist}\ b_4)} ⅋ ((\mathit{song}_1\ \mathit{creator}\ b_4) ; P) \;\rhd\; \underline{(\mathit{song}_0\ \mathit{lyricist}\ b_4)} ⅋ P$$

Above, the conditions for a match are relaxed by the preorder over triples ($\sqsubseteq$). The preorder over
triples is the point-wise extension of the preorder over URIs introduced above.
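The relaxed matching condition can be sketched in a few lines of Python. This is an illustrative assumption about one way to compute it, not the paper's formal definition: `alias_closure` computes the reflexive transitive closure of a set of alias assumptions, `triple_leq` extends it point-wise to triples, and the identifiers `song0` and `song1` stand in for the two song URIs of the example.

```python
from itertools import product


def alias_closure(beta, names):
    """Reflexive transitive closure of the alias assumptions `beta` over `names`."""
    leq = {(n, n) for n in names} | set(beta)
    changed = True
    while changed:
        changed = False
        # product() works on a snapshot, so adding to `leq` inside the loop is safe.
        for (a, b), (c, d) in product(list(leq), repeat=2):
            if b == c and (a, d) not in leq:
                leq.add((a, d))
                changed = True
    return leq


def triple_leq(stored, asked, leq):
    """Point-wise extension of the preorder: each stored component is below the asked one."""
    return all(s == q or (s, q) in leq for s, q in zip(stored, asked))


beta = {("lyricist", "creator"), ("song0", "song1")}
names = {"lyricist", "creator", "song0", "song1", "b4"}
leq = alias_closure(beta, names)

stored = ("song0", "lyricist", "b4")
asked = ("song1", "creator", "b4")
print(triple_leq(stored, asked, leq))  # the stored triple answers the query triple
```

The check is directional: a stored triple using the more specific names answers a query using the more general ones, but not the other way round.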



The tensor and select rules. The following example demonstrates two synchronised queries, in the 
presence of two stored triples. The first query poses a pattern to match, while the second query selects a 
name with respect to a pattern. 

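$$\bigl((b_2\ \mathit{role}\ \mathit{singer}) \otimes \bigvee b.((b\ \mathit{role}\ \mathit{guitarist}) ; P)\bigr) ⅋ \underline{(b_2\ \mathit{role}\ \mathit{singer})} ⅋ \underline{(b_3\ \mathit{role}\ \mathit{guitarist})}$$
$$\rhd\; \underline{(b_2\ \mathit{role}\ \mathit{singer})} ⅋ \underline{(b_3\ \mathit{role}\ \mathit{guitarist})} ⅋ P\{b_3/b\}$$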

In the above example, the 'tensor' rule divides the stored triples between the two parts of the query. On 
the left the 'select' rule is applied. The 'select' rule substitutes a suitable URI for the quantified name. 
The result is that a URI is passed to the continuation. 
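The way tensor divides stored triples and select passes a name on can be illustrated with a toy matcher. The sketch below is a simplification under stated assumptions: queries are encoded as nested tuples (a hypothetical encoding), matching is exact rather than up to the alias preorder, continuations and constraints are omitted, and the function `answers` is introduced here purely for illustration. It also covers choose, which Section 2.2 introduces as the model of UNION.

```python
def subst(term, env):
    """Replace a selected variable by its chosen value, if any."""
    return env.get(term, term)


def answers(query, store, env=None):
    """Yield (env, used) pairs for one atomic answer to `query`.

    `env` maps selected variables to names or literals; `used` holds the
    indices of the stored triples that the answer relies on.
    """
    env = env or {}
    kind = query[0]
    if kind == "ask":
        pattern = tuple(subst(t, env) for t in query[1])
        for i, triple in enumerate(store):
            if triple == pattern:
                yield env, frozenset([i])
    elif kind == "choose":
        # Either branch may answer the query.
        yield from answers(query[1], store, env)
        yield from answers(query[2], store, env)
    elif kind == "tensor":
        # Both parts are answered in the same step, using disjoint stored triples.
        for env1, used1 in answers(query[1], store, env):
            for env2, used2 in answers(query[2], store, env):
                if used1.isdisjoint(used2):
                    yield {**env1, **env2}, used1 | used2
    elif kind == "select":
        var, body = query[1], query[2]
        candidates = {t for triple in store for t in triple}
        for value in sorted(candidates):
            yield from answers(body, store, {**env, var: value})
    else:
        raise ValueError(f"unknown query form: {kind}")


store = [("b2", "role", "singer"), ("b3", "role", "guitarist")]
query = ("tensor",
         ("ask", ("b2", "role", "singer")),
         ("select", "b", ("ask", ("b", "role", "guitarist"))))
for env, used in answers(query, store):
    print(env, sorted(used))
```

In this encoding the substitution returned for the selected variable plays the role of the value passed to the continuation, and the disjointness test is what makes two copies of the same ask require two matching stored triples.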



The choose rule. The following example demonstrates a choice between queries. The 'choose left' 
rule is used in this case. 

$$\bigvee a.\bigl(((a\ \mathit{knows}\ b_2) ; P) \oplus ((b_2\ \mathit{knows}\ a) ; Q)\bigr) ⅋ \underline{(b_1\ \mathit{knows}\ b_2)} \;\rhd\; \underline{(b_1\ \mathit{knows}\ b_2)} ⅋ P\{b_1/a\}$$



The query result determines the continuation triggered.

Constraints in queries. The example query below selects a literal. The data literal appears in a triple
and a constraint. The rules ensure that both a suitable triple appears and the constraint imposed holds.

$$\bigvee x.((|x| < 5) \otimes (b_1\ \mathit{name}\ x) ; P) ⅋ \underline{(b_1\ \mathit{name}\ \text{'John'})} \;\rhd\; \underline{(b_1\ \mathit{name}\ \text{'John'})} ⅋ P\{\text{'John'}/x\}$$

The satisfaction relation for evaluating constraints is left to the W3C recommendation [24].
Satisfaction is assumed to define a Boolean algebra of constraints.

The rules for iteration of queries. The example below demonstrates iteration used to answer two 
copies of the same query. Two iterated queries are answered using 'dereliction', which are combined 
using the conventional tensor rule. The 'contraction' rule then reduces the combined queries to a single 
query. 

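$${*}\bigvee c.((c\ \mathit{is}\ \mathit{busy}) ; P) ⅋ \underline{(b_2\ \mathit{is}\ \mathit{busy})} ⅋ \underline{(b_3\ \mathit{is}\ \mathit{busy})} \;\rhd\; \underline{(b_2\ \mathit{is}\ \mathit{busy})} ⅋ \underline{(b_3\ \mathit{is}\ \mathit{busy})} ⅋ P\{b_2/c\} ⅋ P\{b_3/c\}$$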

A continuation for each result is triggered. Note the 'weakening' rule could be used to allow the query 
to be answered zero times. 

Blank nodes as quantifiers. The example below demonstrates a query which discovers a blank node. 
The 'blank node' rule uses a temporary name to represent the blank node. The result is that the scope of 
the blank node quantifier is extended to include the continuation, which receives the blank node. 

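$$\bigvee c.((c\ \mathit{creator}\ b_2) ; P) ⅋ \bigwedge a.\bigl(\underline{(a\ \mathit{author}\ b_2)} ⅋ \underline{(a\ \mathit{status}\ \mathit{open})}\bigr) \;\rhd\; \bigwedge a.\bigl(P\{a/c\} ⅋ \underline{(a\ \mathit{author}\ b_2)} ⅋ \underline{(a\ \mathit{status}\ \mathit{open})}\bigr)$$

The alias $\mathit{author} \sqsubseteq \mathit{creator}$ is assumed above. The temporary name must not appear in the alias
assumptions ($\beta$). The unused stored triple is idled.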

Rules for an additive disjunction, tensor product, existential quantification, universal quantification 
and iteration, are borrowed from Linear Logic [9]. The sequent calculus is extended to indicate a
continuation process, constraints extend the basic units with a Boolean algebra, and a preorder accommodates
aliases over names.

3 A labelled transition system for the operational semantics 

The operational semantics can be expressed as a labelled transition system. This provides an alternative 
operational semantics to the reduction system. This alternative semantics allows the behaviour of queries 
and data to be evaluated separately and then composed. Lemma 2 verifies that the labelled transition
system and reduction system describe the same behaviour. 

3.1 The purpose of labels 

A labelled transition consists of two processes and a label. The first process is the process before the 
transition. The label is a constraint on the context in which a transition can take place. The second 
process is the resulting process after the transition. 

Figure 4: Labelled transitions for queries: input triple, trigger guard, tensor, choose left, choose right,
filter, select name, select literal, weakening, dereliction and contraction.

The labels are formed from a commutative monoid over triples $(E, \otimes, I)$. A label indicates the inputs
and outputs of a process. An input indicates that a process can proceed if it can receive the triples on the
label from its context. An output indicates that a process outputs the triple on the label to its context. For
instance, the query below inputs a triple; while the stored triple below outputs a triple.

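$$(b_4\ \mathit{knows}\ b_3) ; P \xrightarrow{(b_4\ \mathit{knows}\ b_3)} P \qquad \underline{(b_4\ \mathit{knows}\ b_3)} \xrightarrow{\uparrow (b_4\ \mathit{knows}\ b_3)} \underline{(b_4\ \mathit{knows}\ b_3)}$$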

A relevant interpretation is that the first transition above is an action from the perspective of a client 
which resolves a query; whereas the second is an action from the perspective of a server that provides a 
triple. Two processes composed in parallel with matching inputs and outputs may interact. For instance, 
the above processes can be composed, resulting in the following transition. The unit label indicates an 
operational step without side effects. 

$$(b_4\ \mathit{knows}\ b_3) ; P ⅋ \underline{(b_4\ \mathit{knows}\ b_3)} \xrightarrow{I} P ⅋ \underline{(b_4\ \mathit{knows}\ b_3)}$$

Output labels can also indicate extruded names. For instance, the example below extrudes the name
a. The extruded names represent blank nodes where the scope of the blank node quantifier may be
extended. This is similar to extrusion of new names in the π-calculus [22].

$$\bigwedge a.\underline{(a\ \mathit{has}\ \mathit{paper})} ⅋ \underline{(b_2\ \mathit{has}\ \mathit{stone})} \xrightarrow{a \uparrow (a\ \mathit{has}\ \mathit{paper})} \underline{(a\ \mathit{has}\ \mathit{paper})} ⅋ \underline{(b_2\ \mathit{has}\ \mathit{stone})}$$

The commutative monoid rules can always be applied to reorder labels.



3.2 Labelled transitions for queries 

The input transitions allow the behaviour of a query to be modelled independently. The rules for queries 
are presented in Fig. 4. The rules accumulate RDF triples on an input label, which represents contexts in
which a query may be answered. 

The 'input triple' rule poses the triple as an input on the label. The triple on the label may be 
strengthened by the preorder over triples. The 'trigger guard' rule allows a continuation process to be 
triggered exposing the continuation. The following example demonstrates a query consisting of a single 
triple and a continuation process, where the preorder colleague E knows is assumed. 

$$(b_4\ \mathit{knows}\ b_3) ; P \xrightarrow{(b_4\ \mathit{colleague}\ b_3)} P$$

Select quantifiers are resolved by anticipating the name or literal to input. For instance, the following 
labelled transition indicates that the query can be answered in a context where a name is chosen. The 
same name is passed to the continuation process. 

$$\bigvee a.((b_4\ \mathit{knows}\ a) ; P) \xrightarrow{(b_4\ \mathit{knows}\ b_2)} P\{b_2/a\}$$

Figure 5: Process rules: output triple, open, blank node context, par context, parallel outputs and close.
The symmetric versions of the par context and close rule are also assumed.

Choices are resolved by anticipating the left or right branch. For instance, the following transition 
indicates the label and continuation which results from choosing the left branch. 

$$((b_4\ \mathit{knows}\ b_2) ; P) \oplus ((b_4\ \mathit{knows}\ b_3) ; Q) \xrightarrow{(b_4\ \mathit{knows}\ b_2)} P$$

Tensor synchronises two queries, by composing their respective labels and continuations. For
instance, the following query simultaneously inputs two triples. The continuations of both queries are
triggered in parallel, with the appropriate substitutions.

$$\bigvee a.\bigl(((b_4\ \mathit{knows}\ a) ; P) \otimes \bigvee x.((a\ \mathit{name}\ x) ; Q)\bigr) \xrightarrow{(b_4\ \mathit{knows}\ b_2) \otimes (b_2\ \mathit{name}\ \text{'John'})} P\{b_2/a\} ⅋ Q\{b_2, \text{'John'}/a, x\}$$

A constraint is disposed when it is satisfied. For instance, in the following query the length of a 
selected literal is constrained, but satisfied by the substitution. 

$$\bigvee x.((b_2\ \mathit{name}\ x) \otimes (|x| < 5) ; P) \xrightarrow{(b_2\ \mathit{name}\ \text{'John'})} P\{\text{'John'}/x\}$$

Iteration anticipates the number of copies of a query to pose using weakening, dereliction and
contraction. For instance, two copies of the following query are posed using contraction and dereliction.
The label indicates the two separate triples which are to be answered simultaneously. Both continuations
are composed in parallel.

$${*}\bigvee a.((b_4\ \mathit{knows}\ a) ; P) \xrightarrow{(b_4\ \mathit{knows}\ b_2) \otimes (b_4\ \mathit{knows}\ b_3)} P\{b_2/a\} ⅋ P\{b_3/a\}$$

The rules of the labelled transition system are sufficient to model queries.



3.3 Labelled transitions for an RDF store 

The behaviour of stored RDF triples can be modelled using output labels. The rules of output labels are
presented in Fig. 5. The names extruded on the label are indicated by $\tilde{a}$, where + indicates disjoint union
of names. The abbreviation $\bigwedge \tilde{a}.P$ is used to indicate the quantification of all names in $\tilde{a}$.

Stored triples can output the triple on the label. The same triple appears in the continuation
unchanged. The preorder over names may be used to weaken the output triple. Names are extruded on
the label using the 'open scope' rule. For instance, the following triple outputs a triple and extrudes the
blank node, using the assumption $\mathit{colleague} \sqsubseteq \mathit{knows}$.

$$\bigwedge b_4.\underline{(b_4\ \mathit{colleague}\ b_3)} \xrightarrow{b_4 \uparrow (b_4\ \mathit{knows}\ b_3)} \underline{(b_4\ \mathit{colleague}\ b_3)}$$

Output labels composed in parallel can be combined. Extruded names on both labels must be disjoint 
to preserve the scope of blank nodes. For instance, the following transition simultaneously outputs two 
triples and extrudes three names. 

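$$\bigwedge b_4.\Bigl(\bigwedge b_2.\underline{(b_4\ \mathit{knows}\ b_2)} ⅋ \bigwedge b_3.\underline{(b_4\ \mathit{knows}\ b_3)}\Bigr) \xrightarrow{b_2, b_3, b_4 \uparrow (b_4\ \mathit{knows}\ b_2) \otimes (b_4\ \mathit{knows}\ b_3)} \underline{(b_4\ \mathit{knows}\ b_2)} ⅋ \underline{(b_4\ \mathit{knows}\ b_3)}$$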

Two parallel processes may interact using the close rule. Close allows complementary inputs and
outputs to be matched. Names extruded on the output label are introduced as quantifiers in the
continuation. Any inputs not answered remain on the resulting label, to be answered later. For instance, the
following iterated query is answered twice. One copy is answered by the available process and the other
copy must be answered by the context for the transition to occur. In the continuation, the scope of the
blank node is extended.

$${*}\bigvee a.((b_4\ \mathit{knows}\ a) ; P) ⅋ \bigwedge b_3.\underline{(b_4\ \mathit{knows}\ b_3)} \xrightarrow{(b_4\ \mathit{knows}\ b_2)} \bigwedge b_3.\bigl(P\{b_2/a\} ⅋ P\{b_3/a\} ⅋ \underline{(b_4\ \mathit{knows}\ b_3)}\bigr)$$

The context rule for parallel composition allows a process which does not contribute to an interaction 
to idle. Similarly, the context rule for blank node quantifiers allows a blank node to be ignored in a 
transition if it does not appear on the label. 

3.4 Comparison of the two operational semantics 

To justify the labelled transition system, the labelled transitions are compared to the reductions of the 
reduction system. If a unit labelled transition can be derived then the corresponding reduction can also 
be derived. The significance is that, given the independent perspectives of the query and the store in 
terms of labelled transitions, their combination satisfies the global perspective specified by the reduction 
system. 

Scope extrusion presents technical difficulties. The following technical lemma reduces these
difficulties, by eliminating scope extrusion. The proof demonstrates that combinations of opening names and
closing names can be eliminated from a proof tree which uses an extruded name.

Lemma 1 (Elimination of extrusion). Suppose that a labelled transition proof uses name extrusion, but 
not in the conclusion. The same labelled transition, up to structural congruence, holds without any name 
extrusion. 

Note that full proofs for all theorems are provided in the thesis of the first author [14].

Every completed labelled transition can also be expressed as a reduction (Lemma 2). The proof works
by transforming proof trees so that labels used in interactions are eliminated.

Lemma 2 (Elimination of labels). $P \xrightarrow{I} Q$ if and only if $P \rhd Q$.

Thus the local perspective of the labelled transition system and the global perspective of the reduction
system specify the same operational capabilities.

4 An algebra for the syndication calculus

In this section bisimulation is introduced as the natural notion of equivalence over the labelled transition 
system. Bisimulation is demonstrated to be sound with respect to equivalence in the reduction system. 
Thus every pair of bisimilar processes are equivalent with respect to the natural notion of equivalence 
over the reduction system. Bisimulation is then used to verify an algebra over queries and processes. 

4.1 Bisimulation 

Processes which are capable of the same observable behaviour can be regarded as equivalent. The 
observable behaviour of a process is given by the labels of the labelled transition system. Observational 
equivalence of processes is established using the technique of (strong) bisimulation, as follows. 

Definition 1 (Bisimulation). Bisimulation, written $\sim$, is the greatest symmetric relation such that the
following holds, for any label $l$. If $P \sim Q$ and $P \xrightarrow{l} P'$ then there exists some $Q'$ such that $Q \xrightarrow{l} Q'$ and
$P' \sim Q'$.
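On a finite labelled transition system, the greatest such relation can be computed by starting from all pairs of states and discarding pairs that fail the transfer condition. The sketch below is a naive illustration of that idea, not the proof technique used in the paper; the encoding of an LTS as a dictionary from states to sets of (label, successor) pairs, and the state names, are assumptions made for the example.

```python
def bisimilar(lts, p, q):
    """Naive greatest-fixed-point check of strong bisimilarity on a finite LTS.

    `lts` maps every state to a set of (label, successor) pairs; every
    successor is assumed to be a key of `lts` as well.
    """
    rel = {(s, t) for s in lts for t in lts}

    def matched(s, t):
        # Every move of s is matched by an equally labelled move of t, and
        # vice versa, with the successors still related.
        forth = all(any(a == b and (s1, t1) in rel for (b, t1) in lts[t])
                    for (a, s1) in lts[s])
        back = all(any(a == b and (s1, t1) in rel for (a, s1) in lts[s])
                   for (b, t1) in lts[t])
        return forth and back

    changed = True
    while changed:
        changed = False
        for pair in list(rel):
            if not matched(*pair):
                rel.discard(pair)
                changed = True
    return (p, q) in rel


# A query offered twice behind a choice behaves like the query on its own.
lts = {
    "U+U": {("b4 knows b2", "P")},
    "U": {("b4 knows b2", "P")},
    "V": {("b4 knows b3", "P")},
    "P": set(),
}
print(bisimilar(lts, "U+U", "U"))
print(bisimilar(lts, "U", "V"))
```

Pairs are only ever removed, so the loop terminates on a finite state space, and any bisimulation is preserved by each removal step, which is why the surviving relation is the greatest one.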

The following verifies that bisimulation is a congruence — a relation which holds in any context. It 
is necessary that bisimulation is a congruence for it to be used as an algebra. A context is a process with 
a place holder for some syntax. 

Lemma 3 (Bisimulation is a congruence). If P ~ Q and C is a context, then CP ~ CQ. 

An alternative notion of equivalence is defined using the reduction system. Contextual equivalence
is used in related work to justify notions of bisimulation on the π-calculus and ambient calculus [15, 21].

Definition 2 (Contextual equivalence). Contextual equivalence, written $\simeq$, is the greatest symmetric,
reduction closed, context closed relation. A relation $\mathcal{R}$ is reduction closed iff, whenever $P \mathrel{\mathcal{R}} Q$ and $P \rhd P'$,
there exists some $Q'$ such that $Q \rhd Q'$ and $P' \mathrel{\mathcal{R}} Q'$. A relation $\mathcal{R}$ is context closed iff $P \mathrel{\mathcal{R}} Q$ yields that
$C\,P \mathrel{\mathcal{R}} C\,Q$, for all contexts $C$.

Bisimulation is sound with respect to contextual equivalence. Soundness is essential to justify the 
chosen notion of bisimulation. 

Theorem 1 (Bisimulation is a contextual equivalence). If $P \sim Q$ then $P \simeq Q$.

Proof. Reduction closure follows from Lemma 2 and context closure follows from Lemma 3. □

Soundness of bisimulation ensures that algebraic properties proven using bisimulation also hold for 
contextual equivalence. Bisimulation simplifies proofs in the following section. Note that
completeness (contextual equivalence is a bisimulation) is not required for this work. Completeness can only be
achieved in an extended version of the calculus.

4.2 Algebraic properties of queries 

Using bisimulation as an equivalence, key properties of queries are established. This section amounts 
to a soundness proof of the algebraic properties established. Thus if any two processes are equivalent
according to the algebraic properties then they are bisimilar; and furthermore, by Theorem 1, they are
contextually equivalent.

For the labelled transition system, structural congruence is not assumed, hence it is verified here. The
proof for the distributivity of blank node quantifiers over par requires extensive case analysis. The case
of associativity of par follows from distributivity of blank node quantifiers. Proofs are similar to the
analogous bisimulations in the π-calculus [22].

Proposition 2. The structural congruence (Fig. 2) is a bisimulation. So, $(P, ⅋, \bot)$ forms a commutative
monoid. Blank node quantifiers annihilate with $\bot$, commute, and distribute over $⅋$.

Bisimulation reveals some canonical algebraic properties of queries. Firstly, queries form an
idempotent semiring. Semirings are ubiquitous in computer science. A notable feature of semirings is that the
ideals of a semiring form a semiring.

Proposition 3. $(U, \otimes, \oplus, I, 0)$ is a commutative idempotent semiring. That is, $(U, \otimes, I)$ is a commutative
monoid, $(U, \oplus, 0)$ is an idempotent commutative monoid, $\otimes$ distributes over $\oplus$ and annihilates with $0$.

Idempotent semirings have a natural preorder, given by $U \leq V$ iff $U \oplus V \sim V$. Hence queries have
this natural preorder. An immediate consequence is that choice is a colimit, i.e. least upper bound, of
two queries.

Proposition 4. Choice is a colimit of its branches. That is, $V \leq W$ and $U \leq W$, if and only if $V \oplus U \leq W$.

The preorder over queries can be used to optimise queries. If a query offers a choice between a query
and a weaker query, with respect to the preorder, the stronger branch may be eliminated. For instance,
in related work [23], it is claimed that U OPTIONAL (V OPTIONAL W) is not the same as (U OPTIONAL
V) OPTIONAL W. Under the interpretation of OPTIONAL in the calculus it holds that
$U \otimes ((V \otimes (W \oplus I)) \oplus I) \leq U \otimes ((V \oplus I) \otimes (W \oplus I))$, by distributivity, commutativity and
idempotency. So the first is a stronger query.
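The inequality can be unfolded step by step. The derivation below is a sketch that reads $U\ \mathrm{OPTIONAL}\ V$ as $U \otimes (V \oplus I)$ and uses only the semiring laws of Proposition 3 together with the definition $X \leq Y$ iff $X \oplus Y \sim Y$; the common factor $U$ is left out of the first three lines.

```latex
\begin{align*}
(V \oplus I) \otimes (W \oplus I)
  &\sim (V \otimes W) \oplus V \oplus W \oplus I
  && \text{distributivity, unit} \\
(V \otimes (W \oplus I)) \oplus I
  &\sim (V \otimes W) \oplus V \oplus I
  && \text{distributivity, unit} \\
\bigl((V \otimes W) \oplus V \oplus I\bigr) \oplus \bigl((V \otimes W) \oplus V \oplus W \oplus I\bigr)
  &\sim (V \otimes W) \oplus V \oplus W \oplus I
  && \text{commutativity, idempotency} \\
\text{hence}\quad (V \otimes (W \oplus I)) \oplus I
  &\leq (V \oplus I) \otimes (W \oplus I)
  && \text{definition of } \leq
\end{align*}
```

Tensoring both sides with $U$ preserves the inequality, since $\otimes$ distributes over $\oplus$. The only branch missing from the nested form is $W$ on its own, which is the case where $W$ is answered although $V$ is not.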

A single rule is sufficient to capture the algebra of the select quantifier. From this algebra common
equalities can be derived. The derived rules are suitable for the optimisation technique of flattening
nested selects used in relational algebra [8]. The proof of commutativity of quantifiers requires capture
avoiding substitution to be assumed. The presence of the tensor in the rule is required to prove that
$\bigvee a.U \otimes V \leq \bigvee a.(U \otimes V)$, when $a \notin \mathrm{fn}(V)$.

Proposition 5. Selects are colimits of substitutions. So, $U\{b/a\} \otimes V \leq W$ for all $b$, if and only if
$\bigvee a.U \otimes V \leq W$. Immediate consequences are that select commutes, distributes over choice, is annihilated
by true and distributes over tensor. Furthermore, alpha conversion of bound variables is verified.

The following rules of regular algebra hold. The first of the rules is sufficient to demonstrate that
$*V \otimes U$ is a fixed point of the (monotone) map $W \mapsto U \oplus (V \otimes W)$. The second rule demonstrates that
$*V \otimes U$ is the least such fixed point. Historically, Redko demonstrated that no finite collection of equations
could axiomatise iteration [25]. The formulation below was proven to be complete by Kozen [19].

Proposition 6. An iterated query expands as follows: $*U \sim I \oplus (U \otimes *U)$. Furthermore, if
$U \oplus (V \otimes W) \leq W$ then $*V \otimes U \leq W$.
then *V® U < W. 

A classic consequence of the above is that queries without select can always be denested to a single
iteration [20]. However, select breaks denesting since iteration and select do not commute. For instance
the following query requires two iterations. The result is that for each instance of the first continuation
triggered, zero or more instances of the second continuation are triggered. This query can be expressed
using sub-queries in the current SPARQL Query working draft [10].

$${*}\bigvee a.\bigvee n.\bigl(((a\ \mathit{name}\ n) ; P) \otimes {*}\bigvee e.((a\ \mathit{email}\ e) ; Q)\bigr)$$

Iteration can be expressed as a colimit of repeated queries. This is a strictly more general property
than Proposition 6 [18]. Since all constructs are colimits which distribute over tensor, the ideals
generated by queries form a (commutative) quantale, as exploited by Montanari, Hoare and others [15], [12].
Quantales are related to spectral theory, which is related to information retrieval techniques used by
search engines. Clarification of this connection is future work.

Proposition 7. Iteration is a colimit of powers of queries. So, $U^n \otimes V \leq W$ for all $n$, if and only if
$*U \otimes V \leq W$.

Kozen demonstrates that Boolean algebras can be embedded in Kleene algebras [20]. The 'tests' 
of Kozen correspond to 'constraints' in SPARQL. Bisimulation verifies that the Boolean algebra of 
constraints embeds in the Kleene algebra, in the same manner, with similar consequences. 

Proposition 8. The Boolean algebra of constraints embeds in queries. Using standard classical implication, 
$\phi \Rightarrow \psi$ if and only if $\phi \leq \psi$. 'Or' is choice, 'and' is tensor, 'exists' is select, and an iterated 
constraint is always true. 
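The shape of this embedding can be seen in a relational model of Kleene algebra with tests. The sketch below is an illustration under stated assumptions: constraints are predicates over a small finite universe, embedded as sub-identity relations; choice is union, tensor is relational composition and iteration is reflexive transitive closure. The predicates and helper names are hypothetical, and select ('exists') is not modelled.

```python
from itertools import product

universe = range(6)

def constraint(pred):
    """Embed a Boolean predicate as a sub-identity relation (a 'test')."""
    return frozenset((x, x) for x in universe if pred(x))

def compose(r, s):
    """Relational composition, standing in for the tensor."""
    return frozenset((a, c) for (a, b), (b2, c) in product(r, s) if b == b2)

def star(r):
    """Reflexive transitive closure, computed by saturation."""
    result = constraint(lambda x: True)
    while True:
        extended = result | compose(r, result)
        if extended == result:
            return result
        result = extended

def leq(r, s):
    """The order induced by choice: r <= s when r + s equals s."""
    return r | s == s

TRUE = constraint(lambda x: True)
even = constraint(lambda x: x % 2 == 0)
small = constraint(lambda x: x < 3)
zero = constraint(lambda x: x == 0)

# 'Or' is choice and 'and' is tensor.
print(even | small == constraint(lambda x: x % 2 == 0 or x < 3))
print(compose(even, small) == constraint(lambda x: x % 2 == 0 and x < 3))
# An iterated constraint is always true.
print(star(even) == TRUE)
# Implication coincides with the order: zero implies even, but not conversely.
print(leq(zero, even), leq(even, zero))
```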

As with classical implication, the preorder over triples can be embedded in the partial order over 
processes. However, since alias assumptions are only a preorder, if $C \sim D$ then it holds that $C \sqsubseteq D$ and 
$D \sqsubseteq C$, which is weaker than equality. Maintaining distinction of names is important for applications 
where B is not fixed over time. 

Proposition 9. $C \sqsubseteq D$ if and only if $C \leq D$. 

The multiplicatives 'then', par and times, and the units, are related in the following manner. Combined 
with the previous rules, the properties of 'then' are established. The second rule shows that 'then' can be 
replaced by the unit delay (as in ifTTO. 

Proposition 10. An empty continuation can be removed, a continuation can be decomposed into the 
guard and a unit delayed process, and two continuations can be combined in a single par continuation, 
as follows. 

$I ; \bot \sim I$     $U \otimes (I ; P) \sim U ; P$     $(U ; P) ; Q \sim U ; (P \parr Q)$ 

The algebra can be applied to optimise queries for distribution. In the example below the first query 
is rewritten as the tensor product of two queries. 

$*\bigvee a.\big(((a\ \mathit{knows}\ b_1) ; P) \oplus ((a\ \mathit{knows}\ b_2) ; Q)\big) \sim *\bigvee a.((a\ \mathit{knows}\ b_1) ; P) \otimes *\bigvee a.((a\ \mathit{knows}\ b_2) ; Q)$ 

The second query above is better for distribution. The tensor product allows two smaller queries to be 
immediately evaluated in parallel. The tighter scope of the select quantifiers reduces the branching when 
potential values to select are considered. The distribution of queries across clusters of servers is a major 
problem for processing Linked Data iPTO . 
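
The rewrite can be read as a short derivation. The sketch below assumes that the body of the first query is a choice between two guarded continuations, abbreviated X and Y (names introduced here only for illustration), and that the commutative Kleene identity relating an iterated choice to a tensor of iterations is derivable in the calculus. Neither assumption is spelled out in the text.

```latex
\begin{align*}
*\bigvee a.(X \oplus Y)
  &\sim *\Big(\bigvee a.X \oplus \bigvee a.Y\Big)
  && \text{select distributes over choice} \\
  &\sim *\bigvee a.X \otimes *\bigvee a.Y
  && \text{commutative Kleene algebra: } *(X \oplus Y) \sim *X \otimes *Y
\end{align*}
```

The first step tightens the scope of the select quantifier; the second step exposes the tensor product that permits parallel evaluation.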

5 Conclusion 

The calculus introduced provides the first operational semantics for SPARQL Query — a W3C recom- 
mendation for querying Linked Data. The calculus has a concise logical semantics defined by a reduction 
system. The power of the calculus lies in the synchronisation primitives for queries. The synchronisa- 
tion primitives are required to match the expressiveness of the core of SPARQL Query. Queries are 
internalised in a high-level process calculus, where query results determine continuation processes. 

An alternative labelled transition system is shown to match the expressive power of the reduction 
system. Furthermore, the notion of bisimulation in the labelled transition system is sound with respect 
to equivalence in the reduction system. Bisimulation is used to verify an algebra over queries, which 
extends existing notions of an algebra for SPARQL Query. An algebra of queries is useful when tackling 
problems associated with Linked Data, such as distributed query planning. 
The operational semantics combines several formalisms, as expected for a real language. The queries 
form a semiring, which provides a natural partial order. This partial order is used to characterise choice, 
selects and iteration as colimits. Also, iteration is the least fixed point of a monotonic map over queries, 
hence queries form a Kleene algebra. A preorder over URIs allows small permissible mismatches be- 
tween content and queries to be resolved, capturing key features of the RDFS standard. Also, a Boolean 
algebra of constraints is naturally embedded in queries, to provide further control. The calculus demon- 
strates that key features of SPARQL and related standards for Linked Data can be tightly integrated in 
one framework. 